49 research outputs found

    Humanities Data in R

    Data wrangling, computational burden, automation, robustness and accuracy in ecological inference forecasting of R×C tables

    Acknowledgments. The authors wish to thank Carolina Plescia for providing the Scottish electoral data used in this paper, and two anonymous reviewers and the editors for their valuable comments and suggestions. We are grateful to M. Hodkinson for translating and revising the English of the paper and to Priscila Espinosa for her tips about LaTeX. This research has been supported by the Conselleria d'Innovació, Universitats, Ciència i Societat Digital, Generalitat Valenciana [grant number AICO/2021/257] and by the Ministerio de Economía e Innovación [grant number PID2021-128228NB-I00].
    This paper assesses the two current major alternatives for ecological inference, one based on a multinomial-Dirichlet Bayesian model and the other on mathematical programming. Their performance is evaluated on a database of almost 2000 real datasets for which the actual cross-distributions are known. The analysis reveals the two approaches to be complementary, each performing better in a different area of the simplex space, although the Bayesian solutions deteriorate when information is scarce. After offering some guidelines on the appropriate contexts for employing each algorithm, we conclude with some ideas for exploiting their complementarities.

    Introducing migratory flows in life table construction

    The purpose of life tables is to describe the mortality behaviour of particular groups. The construction of general life tables is based on death statistics and census figures of resident populations under the hypothesis of a closed demographic system. Among other assumptions, this hypothesis implicitly assumes that entries (immigrants) into and exits (emigrants) from the population are usually not significant (being of almost the same magnitude at each age and so compensating each other). This paper theoretically extends the classical solution to open demographic systems and studies the impact of this hypothesis on the construction of a life table. In particular, using the data on residential variations made publicly available by the Spanish National Statistical Office (INE, Instituto Nacional de Estadística) to approximate migratory flows, we introduce these flows into the process of constructing a life table and compare, before and after graduation, the crude mortality rates and the adjusted death probabilities obtained when migratory flows are, and are not, taken into account.

    Field rules and bias in random surveys with quota samples : an assessment of CIS surveys

    Surveys applying quota sampling in their final step are widely used in opinion and market research all over the world. This is also the case in Spain, where the surveys carried out by CIS (a public institution for sociological research supported by the government) have become a point of reference. The rules used by CIS to select individuals within quotas, however, could be improved, as they lead to biases in age distributions. Analysing more than 545,000 responses collected in the 220 monthly barometers conducted by CIS between 1997 and 2016, we compare the empirical distributions of the barometers with the expected distributions from the sample design and/or target populations. Among other results, we find, as a consequence of the rules used, significant overrepresentation of respondents whose ages equal the minimum or maximum of each quota (age and gender group). Furthermore, in line with previous literature, we also note a significant overrepresentation of ages ending in zero. After offering simple solutions to avoid all these biases, we discuss some of their consequences for modelling and inference, as well as the limitations and potential of CIS data.

    Dasymetric distribution of votes in a dense city

    A large proportion of electoral analyses using geography are performed on a small-area basis, such as polling units. Unfortunately, polling units are frequently redrawn, causing breaks in their data series. Previous electoral results play a key role in many analyses. They are used by political party workers and journalists to present quick assessments of outcomes, by political scientists and electoral geographers to carry out detailed scrutiny, and by pollsters and forecasters to anticipate electoral results. In this paper, we study to what extent more complex geographical approaches (based on properly locating electors on the territory using dasymetric techniques) are of value in comparison to simple methods (like areal weighting) for the problem of reallocating votes in a large, dense city. Barcelona is such a city and, having recently redrawn the boundaries of its census sections, it is an ideal candidate for further scrutiny. Although previous studies show the approaches based on dasymetric techniques outperforming simpler solutions for interpolating census figures, our results show that improvements in the process of reallocating votes are marginal. This calls into question the extra effort entailed in introducing ancillary sources of information in a dense urban area for this kind of data. Additional research is required to determine whether and when these results can be extended. (C) 2017 Elsevier Ltd. All rights reserved.
    This work was supported by the Spanish Ministry of Economics and Competitiveness under Grant CSO2013-43054-R.
    Pavia, JM.; Cantarino-Martí, I. (2017). Dasymetric distribution of votes in a dense city. Applied Geography. 86:22-31. https://doi.org/10.1016/j.apgeog.2017.06.021

    On the relationship between knowledge creation and economic performance

    An empirical two-equation dynamic panel-data model system with fixed effects is proposed to analyze the relationship between knowledge creation and economic performance across regions over time. Estimates of the model for Spanish regions show that (i) knowledge creation depends on local R&D effort, on the amount of knowledge in use, and on knowledge creation in neighboring regions; and (ii) assimilation of new knowledge depends on local knowledge creation and on assimilation of knowledge in neighboring regions. Both processes include region-specific context fixed effects and region-specific time effects, representing region-specific dynamic influences. The results imply that (a) efficiency gains at regional level may be achieved by investing locally in the creation of new knowledge, either technological or organizational; (b) creation of knowledge in a region may be promoted by using greater amounts of already existing knowledge, as well as by increasing local R&D effort; (c) both knowledge creation and knowledge assimilation spread to/from neighboring regions; and (d) regional contexts influence both knowledge creation and knowledge assimilation separately. First published online: 05 Feb 201

    Encuestas a pie de urna en España. ¿Error muestral o sesgo de no respuesta? [Exit polls in Spain: sampling error or nonresponse bias?]

    Countless examples of misleading forecasts from both pre-election and exit polls can be found all over the world. Non-representative samples due to differential nonresponse have been claimed to be the main reason for inaccurate exit-poll projections. In real inference problems, it is seldom possible to compare estimates and true values. Electoral forecasts are an exception: comparisons between estimates and final outcomes can be carried out once votes have been tallied. In this paper, we examine the raw data collected in seven exit polls conducted in Spain and test the hypothesis that the data collected in each sampled voting location can be considered a random sample of the results actually recorded at that polling station. Knowing the answer is relevant for both electoral analysts and forecasters because, if the hypothesis is rejected, the shortcomings of the collected data would need to be corrected accordingly. Analysts could improve the quality of their estimates by implementing local correction strategies. We find strong evidence of nonsampling error in Spanish exit polls and evidence that the political context matters. Nonresponse bias is larger in polarized elections and in a climate of fear.

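    Abstención sexual durante la Cuaresma en Andalucía a lo largo del siglo XX y su impacto en la estacionalidad de los nacimientos [Sexual abstinence during Lent in Andalusia throughout the twentieth century and its impact on the seasonality of births]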

    For centuries, a religious precept forbidding sexual intercourse during Lent remained in effect in Catholic populations. This interdiction produced a decrease in conceptions and a rebound after that period, both of which are difficult to detect because populations that did not exercise effective fertility control also showed a peak of conceptions during the spring. This precept has now disappeared as a result of a process of erosion that is not well understood. Using anonymized data on all people born in Andalusia and surviving on 1 January 2003 (n=8,397,206), this paper aims to determine the importance of this interdiction and of its disappearance over the twentieth century, which coincided with the decline in fertility and the deseasonalization of births. Finally, we also analyse to what extent this transition to modernity is the result of a diffusion process that spread from municipalities with large populations to small municipalities.

    Determinants of profitability in Spanish financial institutions. Comparing aided and non-aided entities

    The last financial crisis led to the greatest contribution of public funds ever made to Spanish banks. This paper studies why the need for support was asymmetric, with not all institutions requiring aid. Based on return on assets (ROA), and using panel-data econometric and logit response models, we determine the components of profit and loss accounts that generated profitability, as well as the factors that led some entities to ask for aid. The analyses show that, before the beginning of the crisis, there were significant differences between entities that needed aid and those that did not. The most profitable banks grounded their success in the traditional revenue components of financial institutions (such as interest margins and commissions), as well as in revenues obtained from investee companies and extraordinary results. The model offers a tool to detect entities in difficulties in advance, reducing the financial and social costs of public interventions. The factors with the greatest impact on the profitability of Spanish institutions are also identified.